Getting Started with R

Who are we with respect to R?

Wherever you are, you’re not alone! As we begin learning R (or learning new things in R), remember…

“great frustration and much suckiness…”

Working in R, RStudio

R is the computational engine; RStudio is the interface

Organizing R

For any new project in R, create an R project. Projects allow RStudio to leave notes for itself (e.g., history), will always start a new R session when opened, and will always set the working directory to the Project directory. If you never have to set the working directory at the top of the script, that’s a good thing! [1]

And create a system for organizing the objects in this project!

File Structure Example
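As a minimal sketch of one possible layout, the code below creates a project folder with one subfolder per kind of file. The data/ and scripts/ names match the folders used later in this session; the output/ folder and the my_project name are hypothetical, and a temporary directory is used so nothing is left on your computer.

```r
# Hypothetical project root inside a temporary folder (nothing permanent is created).
project_dir <- file.path(tempdir(), "my_project")

# One folder per kind of file; "output" is an assumed name, not part of the workshop files.
folders <- c("data", "scripts", "output")
for (folder in folders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist in the project.
list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
```

Whatever names you choose, the point is to use the same layout in every project so you always know where to look.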

R Packages

Functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages.

R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can find them on the Comprehensive R Archive Network (CRAN), with more in active development on GitHub.

To use a package, install it once:

  • You can install packages via point-and-click: go to Tools > Install Packages..., enter tidyverse (or a different package name), then click Install.
  • Or you can use this command in the console: install.packages("tidyverse")

In each new R session, you’ll have to load the package if you want access to its functions: e.g., type library(tidyverse).
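The install-once, load-every-session pattern might look like the sketch below. The requireNamespace() check is an addition that skips the install when the package is already present, and the repos URL is an assumption that lets the call work outside RStudio; inside RStudio you can leave it out.

```r
# Install only if the package is missing (this step needs an internet connection).
# The repos argument is an assumed CRAN mirror; RStudio normally sets one for you.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Load the package at the start of every new R session.
library(tidyverse)
```

After library(tidyverse) runs, functions such as read_csv() and filter() are available by name.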

R Basics

  • R is case sensitive
  • Everything in R is an object (vectors, lists, matrices, data frames)
  • # demarcates code comments
  • <- is the assignment operator, how we name new objects in the R environment
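A few lines of base R illustrate the points above. The object names are made up for the example.

```r
# Everything after a # on a line is a comment and is ignored by R.

# <- assigns a value to a name in the R environment.
my_value <- 5
My_Value <- 10          # a different object: R is case sensitive

my_value == My_Value    # FALSE

# Vectors, lists, and data frames are all objects, and each has a class.
scores <- c(90, 85, 77)
class(scores)                 # "numeric"
class(list(1, "a"))           # "list"
class(data.frame(x = 1:3))    # "data.frame"
```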

Reading in data

You can import pretty much any data format into R if you know the right command (and package):

  • CSV: read.csv (base R), read_csv (tidyverse)
  • Excel: read_excel (readxl)
  • Stata, SPSS, SAS: e.g., read.dta (foreign), read_dta (haven)
  • JSON, fixed-width, TXT, DAT, shape files, etc.

Primary data types include numeric, integer, logical, and character; plus factors.
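To see reading and data types together, here is a small self-contained sketch. It writes a made-up CSV to a temporary file (the file contents and the students name are invented for illustration), reads it back with base R, and checks the type of each column.

```r
# Write a tiny, made-up CSV to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,passed", "1,Ana,TRUE", "2,Ben,FALSE"), csv_path)

# Base R reader; stringsAsFactors = FALSE keeps text columns as character.
students <- read.csv(csv_path, stringsAsFactors = FALSE)

class(students$id)        # "integer"
class(students$name)      # "character"
class(students$passed)    # "logical"

# Convert a character column to a factor when it represents categories.
students$name <- factor(students$name)
class(students$name)      # "factor"
levels(students$name)     # "Ana" "Ben"
```

With the tidyverse loaded, read_csv(csv_path) would read the same file into a tibble instead.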

Some initial R commands

Examining data:

  • names()
  • head() and tail()
  • str() and glimpse()
  • summary()
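You can try these commands on mtcars, a small data frame that ships with R (it is used here only for illustration; it is not one of the workshop data sets). Note that glimpse() comes from the tidyverse, so the sketch calls it only if dplyr is installed.

```r
# mtcars is a built-in data frame with 32 rows and 11 columns.
names(mtcars)          # column names
head(mtcars, 3)        # first 3 rows
tail(mtcars, 3)        # last 3 rows
str(mtcars)            # structure: type and first values of each column
summary(mtcars$mpg)    # minimum, quartiles, mean, and maximum of one column

# glimpse() is a tidyverse function, so it needs dplyr to be installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(mtcars)
}
```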

Some dplyr commands

Part of the tidyverse, dplyr is a package for data manipulation. The package implements a grammar for transforming data, based on verbs/functions that define a set of common tasks.

dplyr functions are for data frames.

  • first argument of dplyr functions is always a data frame
  • followed by function specific arguments that detail what to do

dplyr cheatsheet!

select() - extract variables

select() helpers include

  • select(.data, var1:var10): select range of columns
  • select(.data, -c(var1, var2)): select every column except var1 and var2
  • select(.data, starts_with("string")): select columns whose names start with… (or ends_with("string") for names that end with…)
  • select(.data, contains("string")): select columns whose names contain…
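Here is a sketch of these helpers applied to the built-in mtcars data frame (chosen for illustration; its columns are mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb). Wrapping each call in names() shows only which columns were kept.

```r
library(dplyr)

names(select(mtcars, mpg:disp))            # "mpg" "cyl" "disp"
names(select(mtcars, -c(mpg, cyl)))        # every column except mpg and cyl
names(select(mtcars, starts_with("d")))    # "disp" "drat"
names(select(mtcars, ends_with("t")))      # "drat" "wt"
names(select(mtcars, contains("ar")))      # "gear" "carb"
```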

filter() - extract rows

Logical tests

  • x < y: less than
  • x >= y: greater than or equal to
  • x == y: equal to
  • x != y: not equal to
  • x %in% y: is a member of
  • is.na(x): is NA
  • !is.na(x): is not NA

Boolean operators for multiple conditions

  • a & b: and
  • a | b: or
  • xor(a, b): exclusive or (exactly one is true)
  • !a: not
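To see a few of these tests inside filter(), here is a sketch using the built-in mtcars data frame (chosen for illustration only). Each call returns the rows for which the condition is TRUE.

```r
library(dplyr)

# One logical test: cars with mpg of 30 or more.
high_mpg <- filter(mtcars, mpg >= 30)

# Two conditions joined with & (and): 4-cylinder cars with a manual transmission.
manual_four <- filter(mtcars, cyl == 4 & am == 1)

# Membership test: cars with 4 or 6 cylinders.
four_or_six <- filter(mtcars, cyl %in% c(4, 6))

# Keep only rows where hp is not missing (mtcars has no missing values).
has_hp <- filter(mtcars, !is.na(hp))

nrow(high_mpg)    # 4
nrow(has_hp)      # 32
```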

Pipes!

The pipe (%>%) allows you to chain together functions by passing (piping) the result on the left into the first argument of the function on the right. It allows us to call a series of functions in sequence (read the pipe as “and then…”).

dataframe %>% 
  filter(var1 > 0) %>% 
  select(var1, var2, var3) 
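For a version you can run, the sketch below applies the same pattern to the built-in mtcars data frame (an illustrative stand-in for dataframe) and compares it with the equivalent nested call.

```r
library(dplyr)

# Without the pipe: nested calls are read from the inside out.
nested <- select(filter(mtcars, mpg > 25), mpg, cyl, wt)

# With the pipe: take mtcars, and then filter, and then select.
piped <- mtcars %>%
  filter(mpg > 25) %>%
  select(mpg, cyl, wt)

identical(nested, piped)    # TRUE
```

Both versions produce the same data frame; the piped version simply lists the steps in the order they happen.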

Keyboard shortcut to create %>%

  • Mac: cmd + shift + m
  • Windows: ctrl + shift + m

Let’s Play with R!

Click to download a zipped file. Store the unzipped folder on your computer where you can find it. It contains

  • a project file – learningR.Rproj – double-click this to open an RStudio session that will look for and save files in the learningR folder.
  • a data folder – data/ – that contains a small version of some of the data sets on which the second projects will rely
  • a scripts folder – scripts/ – that contains the script I’ll be showing today.
Artwork by @allison_horst


  1. Especially since no one seems to understand paths and directories any more.